Steps
1. Given the formulated question from the assignment description, you will now conduct EDA Checklist items 2-4. First, download 2004 and 2019 data for all sites in California from the EPA Air Quality Data website. Read in the data using data.table(). For each of the two datasets, check the dimensions, headers, footers, variable names and variable types. Check for any data issues, particularly in the key variable we are analyzing. Make sure you write up a summary of all of your findings.
2. Combine the two years of data into one data frame. Use the Date variable to create a new column for year, which will serve as an identifier. Change the names of the key variables so that they are easier to refer to in your code.
3. Create a basic map in leaflet() that shows the locations of the sites (make sure to use different colors for each year). Summarize the spatial distribution of the monitoring sites.
4. Check for any missing or implausible values of PM in the combined dataset. Explore the proportions of each and provide a summary of any temporal patterns you see in these observations.
5. Explore the main question of interest at three different spatial levels: state, county, and site in Los Angeles. Create exploratory plots (e.g. boxplots, histograms, line plots) and summary statistics that best suit each level of data. Be sure to write up explanations of what you observe in these data.
#Install only if we don't have the package
if(!require(data.table)){
install.packages("data.table")
}
#load required package
library(data.table)
#Read in the data
data04 <- data.table::fread("Data04.csv")
data19 <- data.table::fread("Data19.csv")
For each of the two datasets, check the dimensions, headers, footers, variable names and variable types
#2004
dim(data04)
## [1] 19233 20
head(data04)
## Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
## 1: 01/01/2004 AQS 60010007 1 8.9 ug/m3 LC
## 2: 01/02/2004 AQS 60010007 1 12.2 ug/m3 LC
## 3: 01/03/2004 AQS 60010007 1 16.5 ug/m3 LC
## 4: 01/04/2004 AQS 60010007 1 19.5 ug/m3 LC
## 5: 01/05/2004 AQS 60010007 1 11.5 ug/m3 LC
## 6: 01/06/2004 AQS 60010007 1 32.5 ug/m3 LC
## DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
## 1: 37 Livermore 1 100
## 2: 51 Livermore 1 100
## 3: 60 Livermore 1 100
## 4: 67 Livermore 1 100
## 5: 48 Livermore 1 100
## 6: 94 Livermore 1 100
## AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
## 1: 88101 PM2.5 - Local Conditions 41860
## 2: 88502 Acceptable PM2.5 AQI & Speciation Mass 41860
## 3: 88502 Acceptable PM2.5 AQI & Speciation Mass 41860
## 4: 88502 Acceptable PM2.5 AQI & Speciation Mass 41860
## 5: 88502 Acceptable PM2.5 AQI & Speciation Mass 41860
## 6: 88502 Acceptable PM2.5 AQI & Speciation Mass 41860
## CBSA_NAME STATE_CODE STATE COUNTY_CODE COUNTY
## 1: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 2: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 3: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 4: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 5: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 6: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## SITE_LATITUDE SITE_LONGITUDE
## 1: 37.68753 -121.7842
## 2: 37.68753 -121.7842
## 3: 37.68753 -121.7842
## 4: 37.68753 -121.7842
## 5: 37.68753 -121.7842
## 6: 37.68753 -121.7842
tail(data04)
## Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
## 1: 12/14/2004 AQS 61131003 1 11 ug/m3 LC
## 2: 12/17/2004 AQS 61131003 1 16 ug/m3 LC
## 3: 12/20/2004 AQS 61131003 1 17 ug/m3 LC
## 4: 12/23/2004 AQS 61131003 1 9 ug/m3 LC
## 5: 12/26/2004 AQS 61131003 1 24 ug/m3 LC
## 6: 12/29/2004 AQS 61131003 1 9 ug/m3 LC
## DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
## 1: 46 Woodland-Gibson Road 1 100
## 2: 59 Woodland-Gibson Road 1 100
## 3: 61 Woodland-Gibson Road 1 100
## 4: 38 Woodland-Gibson Road 1 100
## 5: 76 Woodland-Gibson Road 1 100
## 6: 38 Woodland-Gibson Road 1 100
## AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
## 1: 88101 PM2.5 - Local Conditions 40900
## 2: 88101 PM2.5 - Local Conditions 40900
## 3: 88101 PM2.5 - Local Conditions 40900
## 4: 88101 PM2.5 - Local Conditions 40900
## 5: 88101 PM2.5 - Local Conditions 40900
## 6: 88101 PM2.5 - Local Conditions 40900
## CBSA_NAME STATE_CODE STATE COUNTY_CODE
## 1: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 2: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 3: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 4: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 5: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 6: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## COUNTY SITE_LATITUDE SITE_LONGITUDE
## 1: Yolo 38.66121 -121.7327
## 2: Yolo 38.66121 -121.7327
## 3: Yolo 38.66121 -121.7327
## 4: Yolo 38.66121 -121.7327
## 5: Yolo 38.66121 -121.7327
## 6: Yolo 38.66121 -121.7327
str(data04)
## Classes 'data.table' and 'data.frame': 19233 obs. of 20 variables:
## $ Date : chr "01/01/2004" "01/02/2004" "01/03/2004" "01/04/2004" ...
## $ Source : chr "AQS" "AQS" "AQS" "AQS" ...
## $ Site ID : int 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
## $ POC : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Daily Mean PM2.5 Concentration: num 8.9 12.2 16.5 19.5 11.5 32.5 14 29.9 21 15.7 ...
## $ UNITS : chr "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
## $ DAILY_AQI_VALUE : int 37 51 60 67 48 94 55 88 70 59 ...
## $ Site Name : chr "Livermore" "Livermore" "Livermore" "Livermore" ...
## $ DAILY_OBS_COUNT : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PERCENT_COMPLETE : num 100 100 100 100 100 100 100 100 100 100 ...
## $ AQS_PARAMETER_CODE : int 88101 88502 88502 88502 88502 88502 88101 88502 88502 88101 ...
## $ AQS_PARAMETER_DESC : chr "PM2.5 - Local Conditions" "Acceptable PM2.5 AQI & Speciation Mass" "Acceptable PM2.5 AQI & Speciation Mass" "Acceptable PM2.5 AQI & Speciation Mass" ...
## $ CBSA_CODE : int 41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
## $ CBSA_NAME : chr "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
## $ STATE_CODE : int 6 6 6 6 6 6 6 6 6 6 ...
## $ STATE : chr "California" "California" "California" "California" ...
## $ COUNTY_CODE : int 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY : chr "Alameda" "Alameda" "Alameda" "Alameda" ...
## $ SITE_LATITUDE : num 37.7 37.7 37.7 37.7 37.7 ...
## $ SITE_LONGITUDE : num -122 -122 -122 -122 -122 ...
## - attr(*, ".internal.selfref")=<externalptr>
#2019
dim(data19)
## [1] 53086 20
head(data19)
## Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
## 1: 01/01/2019 AQS 60010007 3 5.7 ug/m3 LC
## 2: 01/02/2019 AQS 60010007 3 11.9 ug/m3 LC
## 3: 01/03/2019 AQS 60010007 3 20.1 ug/m3 LC
## 4: 01/04/2019 AQS 60010007 3 28.8 ug/m3 LC
## 5: 01/05/2019 AQS 60010007 3 11.2 ug/m3 LC
## 6: 01/06/2019 AQS 60010007 3 2.7 ug/m3 LC
## DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
## 1: 24 Livermore 1 100
## 2: 50 Livermore 1 100
## 3: 68 Livermore 1 100
## 4: 86 Livermore 1 100
## 5: 47 Livermore 1 100
## 6: 11 Livermore 1 100
## AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
## 1: 88101 PM2.5 - Local Conditions 41860
## 2: 88101 PM2.5 - Local Conditions 41860
## 3: 88101 PM2.5 - Local Conditions 41860
## 4: 88101 PM2.5 - Local Conditions 41860
## 5: 88101 PM2.5 - Local Conditions 41860
## 6: 88101 PM2.5 - Local Conditions 41860
## CBSA_NAME STATE_CODE STATE COUNTY_CODE COUNTY
## 1: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 2: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 3: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 4: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 5: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## 6: San Francisco-Oakland-Hayward, CA 6 California 1 Alameda
## SITE_LATITUDE SITE_LONGITUDE
## 1: 37.68753 -121.7842
## 2: 37.68753 -121.7842
## 3: 37.68753 -121.7842
## 4: 37.68753 -121.7842
## 5: 37.68753 -121.7842
## 6: 37.68753 -121.7842
tail(data19)
## Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
## 1: 11/11/2019 AQS 61131003 1 13.5 ug/m3 LC
## 2: 11/17/2019 AQS 61131003 1 18.1 ug/m3 LC
## 3: 11/29/2019 AQS 61131003 1 12.5 ug/m3 LC
## 4: 12/17/2019 AQS 61131003 1 23.8 ug/m3 LC
## 5: 12/23/2019 AQS 61131003 1 1.0 ug/m3 LC
## 6: 12/29/2019 AQS 61131003 1 9.1 ug/m3 LC
## DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
## 1: 54 Woodland-Gibson Road 1 100
## 2: 64 Woodland-Gibson Road 1 100
## 3: 52 Woodland-Gibson Road 1 100
## 4: 76 Woodland-Gibson Road 1 100
## 5: 4 Woodland-Gibson Road 1 100
## 6: 38 Woodland-Gibson Road 1 100
## AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
## 1: 88101 PM2.5 - Local Conditions 40900
## 2: 88101 PM2.5 - Local Conditions 40900
## 3: 88101 PM2.5 - Local Conditions 40900
## 4: 88101 PM2.5 - Local Conditions 40900
## 5: 88101 PM2.5 - Local Conditions 40900
## 6: 88101 PM2.5 - Local Conditions 40900
## CBSA_NAME STATE_CODE STATE COUNTY_CODE
## 1: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 2: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 3: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 4: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 5: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## 6: Sacramento--Roseville--Arden-Arcade, CA 6 California 113
## COUNTY SITE_LATITUDE SITE_LONGITUDE
## 1: Yolo 38.66121 -121.7327
## 2: Yolo 38.66121 -121.7327
## 3: Yolo 38.66121 -121.7327
## 4: Yolo 38.66121 -121.7327
## 5: Yolo 38.66121 -121.7327
## 6: Yolo 38.66121 -121.7327
str(data19)
## Classes 'data.table' and 'data.frame': 53086 obs. of 20 variables:
## $ Date : chr "01/01/2019" "01/02/2019" "01/03/2019" "01/04/2019" ...
## $ Source : chr "AQS" "AQS" "AQS" "AQS" ...
## $ Site ID : int 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
## $ POC : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Daily Mean PM2.5 Concentration: num 5.7 11.9 20.1 28.8 11.2 2.7 2.8 7 3.1 7.1 ...
## $ UNITS : chr "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
## $ DAILY_AQI_VALUE : int 24 50 68 86 47 11 12 29 13 30 ...
## $ Site Name : chr "Livermore" "Livermore" "Livermore" "Livermore" ...
## $ DAILY_OBS_COUNT : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PERCENT_COMPLETE : num 100 100 100 100 100 100 100 100 100 100 ...
## $ AQS_PARAMETER_CODE : int 88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
## $ AQS_PARAMETER_DESC : chr "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
## $ CBSA_CODE : int 41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
## $ CBSA_NAME : chr "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
## $ STATE_CODE : int 6 6 6 6 6 6 6 6 6 6 ...
## $ STATE : chr "California" "California" "California" "California" ...
## $ COUNTY_CODE : int 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY : chr "Alameda" "Alameda" "Alameda" "Alameda" ...
## $ SITE_LATITUDE : num 37.7 37.7 37.7 37.7 37.7 ...
## $ SITE_LONGITUDE : num -122 -122 -122 -122 -122 ...
## - attr(*, ".internal.selfref")=<externalptr>
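Before the two years are stacked, it may help to confirm programmatically that they share the same column names and types, rather than comparing the two str() listings by eye. The following is a minimal sketch on toy tables; toy04 and toy19 are hypothetical stand-ins for data04 and data19, and their values are made up.

```r
library(data.table)

# Toy stand-ins for data04 and data19 (hypothetical values)
toy04 <- data.table(Date = c("01/01/2004", "01/02/2004"),
                    `Daily Mean PM2.5 Concentration` = c(8.9, 12.2),
                    COUNTY = c("Alameda", "Alameda"))
toy19 <- data.table(Date = c("01/01/2019", "01/02/2019"),
                    `Daily Mean PM2.5 Concentration` = c(5.7, 11.9),
                    COUNTY = c("Alameda", "Alameda"))

# Same column names, in the same order?
same_names <- identical(names(toy04), names(toy19))

# Same column classes?
same_types <- identical(sapply(toy04, class), sapply(toy19, class))

same_names
same_types
```

If both checks are TRUE, rbind() can stack the tables without any column matching or type coercion.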
Check for any data issues, particularly in the key variable we are analyzing. Make sure you write up a summary of all of your findings.
#2004
table(data04$Date)
##
## 01/01/2004 01/02/2004 01/03/2004 01/04/2004 01/05/2004 01/06/2004 01/07/2004
## 80 23 22 108 19 22 89
## 01/08/2004 01/09/2004 01/10/2004 01/11/2004 01/12/2004 01/13/2004 01/14/2004
## 26 25 128 25 24 96 23
## 01/15/2004 01/16/2004 01/17/2004 01/18/2004 01/19/2004 01/20/2004 01/21/2004
## 23 116 25 26 80 20 22
## 01/22/2004 01/23/2004 01/24/2004 01/25/2004 01/26/2004 01/27/2004 01/28/2004
## 123 23 24 88 23 24 122
## 01/29/2004 01/30/2004 01/31/2004 02/01/2004 02/02/2004 02/03/2004 02/04/2004
## 28 26 90 24 23 121 20
## 02/05/2004 02/06/2004 02/07/2004 02/08/2004 02/09/2004 02/10/2004 02/11/2004
## 23 82 25 24 120 21 25
## 02/12/2004 02/13/2004 02/14/2004 02/15/2004 02/16/2004 02/17/2004 02/18/2004
## 86 23 25 120 22 21 83
## 02/19/2004 02/20/2004 02/21/2004 02/22/2004 02/23/2004 02/24/2004 02/25/2004
## 21 23 122 24 23 91 22
## 02/26/2004 02/27/2004 02/28/2004 02/29/2004 03/01/2004 03/02/2004 03/03/2004
## 24 115 26 25 81 23 24
## 03/04/2004 03/05/2004 03/06/2004 03/07/2004 03/08/2004 03/09/2004 03/10/2004
## 128 25 26 91 24 24 117
## 03/11/2004 03/12/2004 03/13/2004 03/14/2004 03/15/2004 03/16/2004 03/17/2004
## 21 21 86 24 22 122 21
## 03/18/2004 03/19/2004 03/20/2004 03/21/2004 03/22/2004 03/23/2004 03/24/2004
## 25 92 25 25 124 23 24
## 03/25/2004 03/26/2004 03/27/2004 03/28/2004 03/29/2004 03/30/2004 03/31/2004
## 91 24 23 127 23 22 86
## 04/01/2004 04/02/2004 04/03/2004 04/04/2004 04/05/2004 04/06/2004 04/07/2004
## 19 20 123 18 16 84 19
## 04/08/2004 04/09/2004 04/10/2004 04/11/2004 04/12/2004 04/13/2004 04/14/2004
## 21 117 21 21 71 18 17
## 04/15/2004 04/16/2004 04/17/2004 04/18/2004 04/19/2004 04/20/2004 04/21/2004
## 123 17 18 79 17 19 125
## 04/22/2004 04/23/2004 04/24/2004 04/25/2004 04/26/2004 04/27/2004 04/28/2004
## 20 19 82 20 20 126 21
## 04/29/2004 04/30/2004 05/01/2004 05/02/2004 05/03/2004 05/04/2004 05/05/2004
## 20 83 19 20 123 17 18
## 05/06/2004 05/07/2004 05/08/2004 05/09/2004 05/10/2004 05/11/2004 05/12/2004
## 82 19 19 126 18 19 85
## 05/13/2004 05/14/2004 05/15/2004 05/16/2004 05/17/2004 05/18/2004 05/19/2004
## 21 20 125 20 18 86 18
## 05/20/2004 05/21/2004 05/22/2004 05/23/2004 05/24/2004 05/25/2004 05/26/2004
## 17 120 22 20 74 20 23
## 05/27/2004 05/28/2004 05/29/2004 05/30/2004 05/31/2004 06/01/2004 06/02/2004
## 125 20 22 82 22 21 130
## 06/03/2004 06/04/2004 06/05/2004 06/06/2004 06/07/2004 06/08/2004 06/09/2004
## 22 21 88 22 21 127 22
## 06/10/2004 06/11/2004 06/12/2004 06/13/2004 06/14/2004 06/15/2004 06/16/2004
## 21 82 22 22 113 20 26
## 06/17/2004 06/18/2004 06/19/2004 06/20/2004 06/21/2004 06/22/2004 06/23/2004
## 91 25 28 133 26 26 92
## 06/24/2004 06/25/2004 06/26/2004 06/27/2004 06/28/2004 06/29/2004 06/30/2004
## 31 27 138 28 28 95 31
## 07/01/2004 07/02/2004 07/03/2004 07/04/2004 07/05/2004 07/06/2004 07/07/2004
## 27 124 29 27 87 26 28
## 07/08/2004 07/09/2004 07/10/2004 07/11/2004 07/12/2004 07/13/2004 07/14/2004
## 130 25 26 93 23 26 130
## 07/15/2004 07/16/2004 07/17/2004 07/18/2004 07/19/2004 07/20/2004 07/21/2004
## 29 26 91 25 25 129 28
## 07/22/2004 07/23/2004 07/24/2004 07/25/2004 07/26/2004 07/27/2004 07/28/2004
## 26 85 29 27 128 23 24
## 07/29/2004 07/30/2004 07/31/2004 08/01/2004 08/02/2004 08/03/2004 08/04/2004
## 80 22 23 121 22 26 87
## 08/05/2004 08/06/2004 08/07/2004 08/08/2004 08/09/2004 08/10/2004 08/11/2004
## 25 25 128 24 24 89 25
## 08/12/2004 08/13/2004 08/14/2004 08/15/2004 08/16/2004 08/17/2004 08/18/2004
## 23 114 24 25 83 24 25
## 08/19/2004 08/20/2004 08/21/2004 08/22/2004 08/23/2004 08/24/2004 08/25/2004
## 124 26 27 86 24 26 131
## 08/26/2004 08/27/2004 08/28/2004 08/29/2004 08/30/2004 08/31/2004 09/01/2004
## 24 27 87 28 28 122 27
## 09/02/2004 09/03/2004 09/04/2004 09/05/2004 09/06/2004 09/07/2004 09/08/2004
## 26 89 27 27 128 25 27
## 09/09/2004 09/10/2004 09/11/2004 09/12/2004 09/13/2004 09/14/2004 09/15/2004
## 84 25 27 134 28 27 89
## 09/16/2004 09/17/2004 09/18/2004 09/19/2004 09/20/2004 09/21/2004 09/22/2004
## 28 26 136 27 26 93 26
## 09/23/2004 09/24/2004 09/25/2004 09/26/2004 09/27/2004 09/28/2004 09/29/2004
## 27 126 26 27 86 25 30
## 09/30/2004 10/01/2004 10/02/2004 10/03/2004 10/04/2004 10/05/2004 10/06/2004
## 126 29 29 98 31 30 129
## 10/07/2004 10/08/2004 10/09/2004 10/10/2004 10/11/2004 10/12/2004 10/13/2004
## 34 33 100 32 31 129 32
## 10/14/2004 10/15/2004 10/16/2004 10/17/2004 10/18/2004 10/19/2004 10/20/2004
## 33 99 34 32 124 32 31
## 10/21/2004 10/22/2004 10/23/2004 10/24/2004 10/25/2004 10/26/2004 10/27/2004
## 95 29 30 127 29 31 90
## 10/28/2004 10/29/2004 10/30/2004 10/31/2004 11/01/2004 11/02/2004 11/03/2004
## 32 30 133 31 28 91 33
## 11/04/2004 11/05/2004 11/06/2004 11/07/2004 11/08/2004 11/09/2004 11/10/2004
## 30 117 34 30 95 31 31
## 11/11/2004 11/12/2004 11/13/2004 11/14/2004 11/15/2004 11/16/2004 11/17/2004
## 136 31 31 99 31 33 127
## 11/18/2004 11/19/2004 11/20/2004 11/21/2004 11/22/2004 11/23/2004 11/24/2004
## 32 33 102 32 32 129 31
## 11/25/2004 11/26/2004 11/27/2004 11/28/2004 11/29/2004 11/30/2004 12/01/2004
## 33 99 31 30 121 30 31
## 12/02/2004 12/03/2004 12/04/2004 12/05/2004 12/06/2004 12/07/2004 12/08/2004
## 92 29 34 125 31 33 91
## 12/09/2004 12/10/2004 12/11/2004 12/12/2004 12/13/2004 12/14/2004 12/15/2004
## 34 32 131 30 30 106 31
## 12/16/2004 12/17/2004 12/18/2004 12/19/2004 12/20/2004 12/21/2004 12/22/2004
## 30 115 33 31 95 32 29
## 12/23/2004 12/24/2004 12/25/2004 12/26/2004 12/27/2004 12/28/2004 12/29/2004
## 130 30 33 90 31 32 129
## 12/30/2004 12/31/2004
## 32 35
table(data04$UNITS)
##
## ug/m3 LC
## 19233
table(data04$STATE)
##
## California
## 19233
summary(data04$`Daily Mean PM2.5 Concentration`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.10 6.00 10.10 13.13 16.30 251.00
data04[`Daily Mean PM2.5 Concentration`<0][order(`Daily Mean PM2.5 Concentration`)]
## Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
## 1: 12/08/2004 AQS 60199000 1 -0.1 ug/m3 LC
## DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
## 1: 0 Kaiser Wilderness 1 100
## AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
## 1: 88502 Acceptable PM2.5 AQI & Speciation Mass 23420
## CBSA_NAME STATE_CODE STATE COUNTY_CODE COUNTY SITE_LATITUDE
## 1: Fresno, CA 6 California 19 Fresno 37.22064
## SITE_LONGITUDE
## 1: -119.1556
data04rm <- data04[`Daily Mean PM2.5 Concentration`>=0]
mean(is.na(data04$`Daily Mean PM2.5 Concentration`))
## [1] 0
summary(data04$SITE_LATITUDE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.63 34.07 36.48 36.23 38.10 41.71
summary(data04$SITE_LONGITUDE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -124.2 -121.6 -119.3 -119.7 -117.9 -115.5
#2019
table(data19$Date)
##
## 01/01/2019 01/02/2019 01/03/2019 01/04/2019 01/05/2019 01/06/2019 01/07/2019
## 124 116 180 124 123 152 121
## 01/08/2019 01/09/2019 01/10/2019 01/11/2019 01/12/2019 01/13/2019 01/14/2019
## 119 185 120 116 156 121 121
## 01/15/2019 01/16/2019 01/17/2019 01/18/2019 01/19/2019 01/20/2019 01/21/2019
## 183 124 124 155 124 121 187
## 01/22/2019 01/23/2019 01/24/2019 01/25/2019 01/26/2019 01/27/2019 01/28/2019
## 118 120 158 123 123 187 121
## 01/29/2019 01/30/2019 01/31/2019 02/01/2019 02/02/2019 02/03/2019 02/04/2019
## 123 166 123 125 197 122 122
## 02/05/2019 02/06/2019 02/07/2019 02/08/2019 02/09/2019 02/10/2019 02/11/2019
## 164 122 123 185 127 124 166
## 02/12/2019 02/13/2019 02/14/2019 02/15/2019 02/16/2019 02/17/2019 02/18/2019
## 125 126 198 126 128 168 125
## 02/19/2019 02/20/2019 02/21/2019 02/22/2019 02/23/2019 02/24/2019 02/25/2019
## 127 185 123 124 170 124 123
## 02/26/2019 02/27/2019 02/28/2019 03/01/2019 03/02/2019 03/03/2019 03/04/2019
## 197 123 120 168 127 129 198
## 03/05/2019 03/06/2019 03/07/2019 03/08/2019 03/09/2019 03/10/2019 03/11/2019
## 125 126 165 126 127 203 127
## 03/12/2019 03/13/2019 03/14/2019 03/15/2019 03/16/2019 03/17/2019 03/18/2019
## 131 174 125 123 192 124 122
## 03/19/2019 03/20/2019 03/21/2019 03/22/2019 03/23/2019 03/24/2019 03/25/2019
## 176 126 125 204 130 128 172
## 03/26/2019 03/27/2019 03/28/2019 03/29/2019 03/30/2019 03/31/2019 04/01/2019
## 130 130 197 129 129 177 130
## 04/02/2019 04/03/2019 04/04/2019 04/05/2019 04/06/2019 04/07/2019 04/08/2019
## 126 205 132 130 176 131 128
## 04/09/2019 04/10/2019 04/11/2019 04/12/2019 04/13/2019 04/14/2019 04/15/2019
## 195 127 123 170 130 130 205
## 04/16/2019 04/17/2019 04/18/2019 04/19/2019 04/20/2019 04/21/2019 04/22/2019
## 130 130 175 131 130 202 133
## 04/23/2019 04/24/2019 04/25/2019 04/26/2019 04/27/2019 04/28/2019 04/29/2019
## 133 174 130 127 208 125 125
## 04/30/2019 05/01/2019 05/02/2019 05/03/2019 05/04/2019 05/05/2019 05/06/2019
## 172 128 127 193 127 127 176
## 05/07/2019 05/08/2019 05/09/2019 05/10/2019 05/11/2019 05/12/2019 05/13/2019
## 128 127 206 126 127 174 122
## 05/14/2019 05/15/2019 05/16/2019 05/17/2019 05/18/2019 05/19/2019 05/20/2019
## 125 198 128 125 176 127 127
## 05/21/2019 05/22/2019 05/23/2019 05/24/2019 05/25/2019 05/26/2019 05/27/2019
## 208 128 129 178 131 130 203
## 05/28/2019 05/29/2019 05/30/2019 05/31/2019 06/01/2019 06/02/2019 06/03/2019
## 128 129 174 127 129 209 127
## 06/04/2019 06/05/2019 06/06/2019 06/07/2019 06/08/2019 06/09/2019 06/10/2019
## 127 175 127 129 198 127 124
## 06/11/2019 06/12/2019 06/13/2019 06/14/2019 06/15/2019 06/16/2019 06/17/2019
## 176 131 129 205 136 131 174
## 06/18/2019 06/19/2019 06/20/2019 06/21/2019 06/22/2019 06/23/2019 06/24/2019
## 131 129 195 125 128 172 128
## 06/25/2019 06/26/2019 06/27/2019 06/28/2019 06/29/2019 06/30/2019 07/01/2019
## 130 206 130 131 177 131 131
## 07/02/2019 07/03/2019 07/04/2019 07/05/2019 07/06/2019 07/07/2019 07/08/2019
## 202 132 132 181 133 132 207
## 07/09/2019 07/10/2019 07/11/2019 07/12/2019 07/13/2019 07/14/2019 07/15/2019
## 131 131 178 130 131 205 127
## 07/16/2019 07/17/2019 07/18/2019 07/19/2019 07/20/2019 07/21/2019 07/22/2019
## 131 178 129 129 213 130 125
## 07/23/2019 07/24/2019 07/25/2019 07/26/2019 07/27/2019 07/28/2019 07/29/2019
## 174 126 127 198 130 130 172
## 07/30/2019 07/31/2019 08/01/2019 08/02/2019 08/03/2019 08/04/2019 08/05/2019
## 133 129 205 128 129 177 124
## 08/06/2019 08/07/2019 08/08/2019 08/09/2019 08/10/2019 08/11/2019 08/12/2019
## 128 197 125 128 173 129 127
## 08/13/2019 08/14/2019 08/15/2019 08/16/2019 08/17/2019 08/18/2019 08/19/2019
## 207 128 130 171 128 127 199
## 08/20/2019 08/21/2019 08/22/2019 08/23/2019 08/24/2019 08/25/2019 08/26/2019
## 127 130 167 130 126 204 122
## 08/27/2019 08/28/2019 08/29/2019 08/30/2019 08/31/2019 09/01/2019 09/02/2019
## 127 170 124 128 193 129 127
## 09/03/2019 09/04/2019 09/05/2019 09/06/2019 09/07/2019 09/08/2019 09/09/2019
## 166 123 126 207 128 125 170
## 09/10/2019 09/11/2019 09/12/2019 09/13/2019 09/14/2019 09/15/2019 09/16/2019
## 125 123 198 129 126 174 120
## 09/17/2019 09/18/2019 09/19/2019 09/20/2019 09/21/2019 09/22/2019 09/23/2019
## 121 203 123 119 172 123 124
## 09/24/2019 09/25/2019 09/26/2019 09/27/2019 09/28/2019 09/29/2019 09/30/2019
## 195 124 127 177 128 127 205
## 10/01/2019 10/02/2019 10/03/2019 10/04/2019 10/05/2019 10/06/2019 10/07/2019
## 126 126 171 128 126 196 125
## 10/08/2019 10/09/2019 10/10/2019 10/11/2019 10/12/2019 10/13/2019 10/14/2019
## 126 164 117 122 201 124 123
## 10/15/2019 10/16/2019 10/17/2019 10/18/2019 10/19/2019 10/20/2019 10/21/2019
## 175 128 125 189 124 126 169
## 10/22/2019 10/23/2019 10/24/2019 10/25/2019 10/26/2019 10/27/2019 10/28/2019
## 124 124 200 127 125 153 112
## 10/29/2019 10/30/2019 10/31/2019 11/01/2019 11/02/2019 11/03/2019 11/04/2019
## 118 177 123 128 176 129 127
## 11/05/2019 11/06/2019 11/07/2019 11/08/2019 11/09/2019 11/10/2019 11/11/2019
## 201 128 127 174 128 129 196
## 11/12/2019 11/13/2019 11/14/2019 11/15/2019 11/16/2019 11/17/2019 11/18/2019
## 127 126 177 124 124 202 123
## 11/19/2019 11/20/2019 11/21/2019 11/22/2019 11/23/2019 11/24/2019 11/25/2019
## 125 165 126 128 193 127 126
## 11/26/2019 11/27/2019 11/28/2019 11/29/2019 11/30/2019 12/01/2019 12/02/2019
## 168 128 126 199 127 123 168
## 12/03/2019 12/04/2019 12/05/2019 12/06/2019 12/07/2019 12/08/2019 12/09/2019
## 126 126 187 120 120 167 117
## 12/10/2019 12/11/2019 12/12/2019 12/13/2019 12/14/2019 12/15/2019 12/16/2019
## 119 196 122 117 169 121 120
## 12/17/2019 12/18/2019 12/19/2019 12/20/2019 12/21/2019 12/22/2019 12/23/2019
## 186 125 129 174 129 128 203
## 12/24/2019 12/25/2019 12/26/2019 12/27/2019 12/28/2019 12/29/2019 12/30/2019
## 129 129 165 123 123 191 120
## 12/31/2019
## 126
table(data19$UNITS)
##
## ug/m3 LC
## 53086
table(data19$STATE)
##
## California
## 53086
summary(data19$`Daily Mean PM2.5 Concentration`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.200 4.000 6.500 7.734 9.900 120.900
data19[`Daily Mean PM2.5 Concentration`<0][order(`Daily Mean PM2.5 Concentration`)]
## Date Source Site ID POC Daily Mean PM2.5 Concentration UNITS
## 1: 03/16/2019 AQS 60130002 3 -2.2 ug/m3 LC
## 2: 03/02/2019 AQS 60611004 3 -2.0 ug/m3 LC
## 3: 03/05/2019 AQS 60611004 3 -2.0 ug/m3 LC
## 4: 03/06/2019 AQS 60611004 3 -2.0 ug/m3 LC
## 5: 03/07/2019 AQS 60611004 3 -2.0 ug/m3 LC
## ---
## 278: 02/18/2019 AQS 60832011 1 -0.1 ug/m3 LC
## 279: 04/05/2019 AQS 60832011 1 -0.1 ug/m3 LC
## 280: 12/26/2019 AQS 61110007 3 -0.1 ug/m3 LC
## 281: 01/06/2019 AQS 61110009 3 -0.1 ug/m3 LC
## 282: 01/07/2019 AQS 61110009 3 -0.1 ug/m3 LC
## DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
## 1: 0 Concord 1 100
## 2: 0 Tahoe City-Fairway Drive 1 100
## 3: 0 Tahoe City-Fairway Drive 1 100
## 4: 0 Tahoe City-Fairway Drive 1 100
## 5: 0 Tahoe City-Fairway Drive 1 100
## ---
## 278: 0 Goleta 1 100
## 279: 0 Goleta 1 100
## 280: 0 Thousand Oaks 1 100
## 281: 0 Piru - Pacific 1 100
## 282: 0 Piru - Pacific 1 100
## AQS_PARAMETER_CODE AQS_PARAMETER_DESC CBSA_CODE
## 1: 88101 PM2.5 - Local Conditions 41860
## 2: 88502 Acceptable PM2.5 AQI & Speciation Mass 40900
## 3: 88502 Acceptable PM2.5 AQI & Speciation Mass 40900
## 4: 88502 Acceptable PM2.5 AQI & Speciation Mass 40900
## 5: 88502 Acceptable PM2.5 AQI & Speciation Mass 40900
## ---
## 278: 88101 PM2.5 - Local Conditions 42200
## 279: 88101 PM2.5 - Local Conditions 42200
## 280: 88101 PM2.5 - Local Conditions 37100
## 281: 88101 PM2.5 - Local Conditions 37100
## 282: 88101 PM2.5 - Local Conditions 37100
## CBSA_NAME STATE_CODE STATE COUNTY_CODE
## 1: San Francisco-Oakland-Hayward, CA 6 California 13
## 2: Sacramento--Roseville--Arden-Arcade, CA 6 California 61
## 3: Sacramento--Roseville--Arden-Arcade, CA 6 California 61
## 4: Sacramento--Roseville--Arden-Arcade, CA 6 California 61
## 5: Sacramento--Roseville--Arden-Arcade, CA 6 California 61
## ---
## 278: Santa Maria-Santa Barbara, CA 6 California 83
## 279: Santa Maria-Santa Barbara, CA 6 California 83
## 280: Oxnard-Thousand Oaks-Ventura, CA 6 California 111
## 281: Oxnard-Thousand Oaks-Ventura, CA 6 California 111
## 282: Oxnard-Thousand Oaks-Ventura, CA 6 California 111
## COUNTY SITE_LATITUDE SITE_LONGITUDE
## 1: Contra Costa 37.93601 -122.0262
## 2: Placer 39.16602 -120.1488
## 3: Placer 39.16602 -120.1488
## 4: Placer 39.16602 -120.1488
## 5: Placer 39.16602 -120.1488
## ---
## 278: Santa Barbara 34.44551 -119.8284
## 279: Santa Barbara 34.44551 -119.8284
## 280: Ventura 34.21017 -118.8705
## 281: Ventura 34.40428 -118.8100
## 282: Ventura 34.40428 -118.8100
data19rm <- data19[`Daily Mean PM2.5 Concentration`>=0]
mean(is.na(data19$`Daily Mean PM2.5 Concentration`))
## [1] 0
summary(data19$SITE_LATITUDE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.58 34.14 36.63 36.35 37.97 41.76
summary(data19$SITE_LONGITUDE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -124.2 -121.6 -119.8 -119.8 -118.1 -115.5
Summary: There are no missing values in the key variable (daily mean PM2.5 concentration) in either year. Both datasets contain negative concentrations, which are implausible: 1 observation in 2004 and 282 in 2019. The mean, median and maximum of daily mean PM2.5 concentration in California all decreased from 2004 to 2019. After excluding the negative values, the 2004 data have a mean of 13.13 ug/m3 and a median of 10.10 ug/m3, with a range from 0 to 251 ug/m3. The 2019 data have a mean of 7.78 ug/m3 and a median of 6.5 ug/m3, with a range from 0 to 120.90 ug/m3.
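To see whether the negative values are concentrated at a few monitors, they can be counted by site. This is a minimal sketch on a toy table; the rows are hypothetical and only reuse site names that appear in the output above.

```r
library(data.table)

# Toy stand-in for data19 (hypothetical rows)
toy <- data.table(
  `Site Name` = c("Concord", "Tahoe City-Fairway Drive",
                  "Tahoe City-Fairway Drive", "Goleta", "Goleta"),
  `Daily Mean PM2.5 Concentration` = c(-2.2, -2.0, -2.0, -0.1, 5.3)
)

# Count negative readings per site, largest count first
neg_by_site <- toy[`Daily Mean PM2.5 Concentration` < 0,
                   .N,
                   by = .(site = `Site Name`)][order(-N)]
neg_by_site
```

Applied to the real data, a table like this would show whether a small number of sites account for most of the implausible readings.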
#Step 2: combine and organize the data
#Combine the two years of data into one data frame.
total <- rbind(data04, data19)
totalrm <- rbind(data04rm, data19rm)
#Use the Date variable to create a new column for year, which will serve as an identifier.
totalrm$Date <- as.POSIXct(totalrm$Date, format = "%m/%d/%Y")
totalrm$year <- format(totalrm$Date, format = "%Y")
table(totalrm$year)
##
## 2004 2019
## 19232 52804
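An alternative way to derive the year is to parse the strings as Date (which carries no time zone) and use data.table's year() helper. This is a sketch on a toy table, not a replacement for the code above; toy and its values are hypothetical.

```r
library(data.table)

# Toy stand-in for the combined table (hypothetical rows)
toy <- data.table(Date = c("01/01/2004", "12/29/2004", "11/11/2019"))

# Parse month/day/year strings as Date, then extract the calendar year
toy[, Date := as.Date(Date, format = "%m/%d/%Y")]
toy[, year := year(Date)]

table(toy$year)
```

Here year is numeric; wrapping it in factor() or as.character() would make it behave as a discrete identifier in plots and color palettes.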
#Change the names of the key variables so that they are easier to refer to in your code.
names(totalrm)[names(totalrm) == "Daily Mean PM2.5 Concentration"] <- "PM2.5"
names(totalrm)[names(totalrm) == "SITE_LATITUDE"] <- "lat"
names(totalrm)[names(totalrm) == "SITE_LONGITUDE"] <- "lon"
names(totalrm)[names(totalrm) == "Site Name"] <- "site"
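The four renames can also be done in one call with data.table's setnames(), which modifies the table by reference. This is a minimal sketch on a toy table with the same column names; the values are hypothetical.

```r
library(data.table)

# Toy stand-in with the original EPA column names (hypothetical values)
toy <- data.table(`Daily Mean PM2.5 Concentration` = c(8.9, 5.7),
                  SITE_LATITUDE = c(37.68753, 38.66121),
                  SITE_LONGITUDE = c(-121.7842, -121.7327),
                  `Site Name` = c("Livermore", "Woodland-Gibson Road"))

# Rename all key variables at once, in place
setnames(toy,
         old = c("Daily Mean PM2.5 Concentration", "SITE_LATITUDE",
                 "SITE_LONGITUDE", "Site Name"),
         new = c("PM2.5", "lat", "lon", "site"))

names(toy)
```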
#Step 3: Create a basic map in leaflet() that shows the locations of the sites (make sure to use different colors for each year).
library(leaflet)
#Generating a color palette
year.pal <- colorFactor(c('darkgreen', 'goldenrod'), domain=totalrm$year)
#Map
leaflet(totalrm) %>%
addProviderTiles('CartoDB.Positron') %>%
addCircles(lat=~lat, lng=~lon, color=~year.pal(year))
Summarize the spatial distribution of the monitoring sites.
Summary: The monitoring sites are spread across California from north to south. The site locations in 2004 and 2019 largely overlap.
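Because every daily observation repeats its site's coordinates, the map can be drawn from one row per site and year, which also gives a site count per year. This is a sketch on a toy table with hypothetical rows. The leaflet part runs only if the package is installed, and the legend is an addition that is not in the map above.

```r
library(data.table)

# Toy stand-in for totalrm (hypothetical rows; several days per site)
toy <- data.table(
  site = c("Livermore", "Livermore", "Livermore",
           "Woodland-Gibson Road", "Woodland-Gibson Road", "Woodland-Gibson Road"),
  lat  = c(37.68753, 37.68753, 37.68753, 38.66121, 38.66121, 38.66121),
  lon  = c(-121.7842, -121.7842, -121.7842, -121.7327, -121.7327, -121.7327),
  year = c("2004", "2004", "2019", "2004", "2019", "2019")
)

# One row per site and year instead of one row per daily observation
sites <- unique(toy[, .(site, lat, lon, year)])

# Number of distinct sites in each year
sites_per_year <- sites[, .N, by = year]
sites_per_year

# Draw the map from the de-duplicated table (only if leaflet is available)
if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  year.pal <- colorFactor(c("darkgreen", "goldenrod"), domain = sites$year)
  site_map <- leaflet(sites) %>%
    addProviderTiles("CartoDB.Positron") %>%
    addCircles(lat = ~lat, lng = ~lon, color = ~year.pal(year)) %>%
    addLegend(pal = year.pal, values = ~year, title = "Year")
}
```

Comparing the per-year site counts is one way to back up a statement about how much the two monitoring networks overlap.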
#Step 4: Check for any missing or implausible values of PM2.5 in the combined dataset. Explore the proportions of each and provide a summary of any temporal patterns you see in these observations.
mean(is.na(totalrm$PM2.5))
## [1] 0
summary(total$`Daily Mean PM2.5 Concentration`)
mean(total$`Daily Mean PM2.5 Concentration`<0)
There are no missing values of PM2.5. However, there are implausible negative values: 283 of the 72,319 observations (1 in 2004 and 282 in 2019), a proportion of 0.39%.
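To look for temporal patterns in the implausible values, the share of negative readings can be tabulated by year and month. This is a minimal sketch on a toy table; the column name pm25 and all values are hypothetical.

```r
library(data.table)

# Toy stand-in for the combined, uncleaned data (hypothetical rows)
toy <- data.table(
  Date = as.Date(c("2004-12-08", "2004-12-09", "2019-03-02",
                   "2019-03-05", "2019-03-06", "2019-07-01")),
  pm25 = c(-0.1, 9.0, -2.0, -2.0, 4.1, 6.5)
)

# Calendar year and month of each observation
toy[, `:=`(year = year(Date), month = month(Date))]

# Count and proportion of negative readings in each year-month
neg_by_month <- toy[, .(n = .N,
                        n_neg = sum(pm25 < 0),
                        prop_neg = mean(pm25 < 0)),
                    by = .(year, month)]
neg_by_month
```

On the real data, months with an unusually high prop_neg would indicate when the negative readings cluster.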
#Step 5: Explore the main question of interest at three different spatial levels. Create exploratory plots (e.g. boxplots, histograms, line plots) and summary statistics that best suit each level of data. Be sure to write up explanations of what you observe in these data.
library(ggplot2)
library(dplyr)
#state
ggplot(totalrm) +
geom_boxplot(mapping = aes(x=year, y=PM2.5))
#county
ggplot(data=totalrm[totalrm$PM2.5<100]) +
geom_boxplot(mapping = aes(x=year, y=PM2.5))+
facet_wrap(~COUNTY, scales="free")
#site in Los Angeles
ggplot(data=totalrm[totalrm$PM2.5<100 & totalrm$COUNTY=="Los Angeles"]) +
geom_boxplot(mapping = aes(x=year, y=PM2.5)) +
facet_wrap(~site, scales="free")
Overall, PM2.5 levels were higher in 2004 than in 2019 at the state, county, and site levels. This suggests that PM2.5 concentrations in California decreased from 2004 to 2019.
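Summary statistics can complement the boxplots at each spatial level. The sketch below computes them by year and by county and year on a toy table, then reshapes the county means to one column per year so the change is explicit. All values are hypothetical and do not come from the EPA data.

```r
library(data.table)

# Toy stand-in for totalrm (hypothetical values)
toy <- data.table(
  year   = c("2004", "2004", "2019", "2019", "2004", "2019"),
  COUNTY = c("Los Angeles", "Los Angeles", "Los Angeles", "Los Angeles",
             "Fresno", "Fresno"),
  PM2.5  = c(20, 16, 12, 10, 18, 9)
)

# State level: one row per year
state_stats <- toy[, .(n = .N,
                       mean_pm = mean(PM2.5),
                       median_pm = median(PM2.5)),
                   by = year]

# County level: mean per county and year, reshaped to one column per year
county_stats <- toy[, .(mean_pm = mean(PM2.5)), by = .(COUNTY, year)]
county_wide <- dcast(county_stats, COUNTY ~ year, value.var = "mean_pm")
county_wide[, change := `2019` - `2004`]

state_stats
county_wide
```

A negative change for a county means its mean PM2.5 was lower in 2019 than in 2004. The same pattern with by = .(site, year), restricted to COUNTY == "Los Angeles", would cover the site level.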